Abstract

Tomasetti and Vogelstein recently proposed that the majority of variation in cancer 
risk among tissues is due to “bad luck,” that is, random mutations arising during DNA 
replication in normal noncancerous stem cells. They generalize this finding to cancer overall, 
claiming that “the stochastic effects of DNA replication appear to be the major contributor 
to cancer in humans.” We show that this conclusion results from a logical fallacy based 
on ignoring the influence of population heterogeneity in correlations exhibited at the level 
of the whole population. The fact that environmental and genetic factors cannot explain the 
huge differences in cancer rates between different organs does not imply that these 
factors play a minor role in cancer rates. On the contrary, we show that one can measure 
huge differences in cancer rates between different organs and, at the same time, observe a 
strong effect of environmental and genetic factors on cancer rates. 


Tomasetti and Vogelstein showed that the lifetime risk of cancers of many different types is 
strongly correlated (0.81) with the total number of divisions of the normal self-renewing cells 
maintaining the homeostasis of the corresponding tissue [11]. They conclude from this that the majority 
of variation in cancer risk among tissues is due to “bad luck,” that is, random mutations arising 
during DNA replication in normal noncancerous stem cells. Generalizing to cancer causation, 
they claim that “these stochastic influences are in fact the major contributors to cancer overall, 
often more important than either hereditary or external environmental factors.” In a review by 
Couzin-Frankel [1] of Tomasetti and Vogelstein’s article, supported by an interview with Tomasetti, 
the above-mentioned correlation is interpreted as largely excluding the role of hereditary 
or environmental factors in the generation of cancers. Couzin-Frankel claims that Tomasetti and 
Vogelstein’s results “explained two-thirds of all cancers.” 

Here, we show that this conclusion is fundamentally flawed, as it rests on neglecting the 
influence of population heterogeneity in correlations exhibited at the level of the whole population. 
Tomasetti and Vogelstein’s results quantify nicely that a large part of the differences in 
organ-specific cancer risk can be explained by the number of stem cell divisions in different tissues. But 
it is a logical fallacy to infer that, because environmental and genetic factors cannot explain 
the huge differences in cancer rates between different organs, these factors play a minor role 
in cancer rates. On the contrary, we show that one can measure huge differences in cancer 
rates between different organs and at the same time observe a strong effect of environmental and 
genetic factors on cancer rates. 

To make our demonstration as clear as possible, we imagine a hypothetical population 
partitioned into two groups. The first group exhibits a much lower cancer rate than the second 
group. This may be due to hereditary and environmental factors playing an important role, in 
addition to the number of stem cell divisions in organs. We show that, for any given organ, a 
correlation between lifetime cancer risk and the total number of stem cell divisions at the group 
level (averaged over the whole population) translates into an equal or higher correlation at the 
level of the whole population. This, however, says nothing about a possible heterogeneity in 
susceptibilities to external factors such as genetics or environment. 

For each of the two groups, we assume that the linear correlation of the type found in Ref. [11] 
holds: 

$$C_i^{(1)} = \beta^{(1)} S_i^{(1)} + \epsilon_i^{(1)} , \tag{1}$$

$$C_i^{(2)} = \beta^{(2)} S_i^{(2)} + \epsilon_i^{(2)} . \tag{2}$$

$C_i^{(1)}$ and $C_i^{(2)}$ are the logarithms in base 10 of the lifetime cancer risks for group 1 and group 2, 
respectively, for organ tissue $i$. $S_i^{(1)}$ and $S_i^{(2)}$ are the logarithms in base 10 of the total numbers 
of divisions of stem cells in group 1 and group 2, respectively, for organ tissue $i$. $\epsilon_i^{(1)}$ and $\epsilon_i^{(2)}$ are 
the logarithms in base 10 of the contributions to lifetime cancer risks in the two groups in organ 
tissue $i$ not explained by stem cell divisions^1. Finally, the coefficients $\beta^{(1)}$ and $\beta^{(2)}$ quantify the 
correlation between $C_i^{(j)}$ and $S_i^{(j)}$, $j = 1,2$, across all organ tissues. 

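The correlation between $C_i^{(j)}$ and $S_i^{(j)}$ is given by 

$$\mathrm{Corr}[C_i^{(j)}, S_i^{(j)}] = \frac{\beta^{(j)}\,\mathrm{Var}[S_i^{(j)}]}{\sqrt{\mathrm{Var}[C_i^{(j)}]\,\mathrm{Var}[S_i^{(j)}]}} . \tag{3}$$

We also introduce the covariance between $C_i^{(j)}$ and $S_i^{(j)}$ defined by 

$$\mathrm{Cov}[C_i^{(j)}, S_i^{(j)}] := \beta^{(j)}\,\mathrm{Var}[S_i^{(j)}] . \tag{4}$$

The variances of $C_i^{(j)}$ are 

$$\mathrm{Var}[C_i^{(j)}] := [\beta^{(j)}]^2\,\mathrm{Var}[S_i^{(j)}] + \mathrm{Var}[\epsilon_i^{(j)}] . \tag{5}$$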
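
As a minimal numerical sketch of the linear model and of the correlation it implies, one can draw synthetic tissues and compare the sample correlation with the value predicted from the slope and the two variances. The Gaussian draws, the parameter values and the helper names (`pearson`, `predicted_corr`, `simulate_group`) are assumptions made for illustration only; they are not taken from Ref. [11].

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def predicted_corr(beta, var_s, var_eps):
    """Correlation implied by the model: beta * Var[S] / sqrt(Var[C] * Var[S])."""
    var_c = beta ** 2 * var_s + var_eps      # variance of C, with eps independent of S
    return beta * var_s / math.sqrt(var_c * var_s)

def simulate_group(beta, sd_s, sd_eps, n_tissues, rng):
    """Draw hypothetical (S_i, C_i) pairs from C_i = beta * S_i + eps_i."""
    # S is centred at zero here; its mean does not affect the correlation.
    s = [rng.gauss(0.0, sd_s) for _ in range(n_tissues)]
    c = [beta * si + rng.gauss(0.0, sd_eps) for si in s]
    return s, c

rng = random.Random(0)
beta, sd_s, sd_eps = 0.5, 2.0, 0.7           # illustrative values, not fitted to data
s, c = simulate_group(beta, sd_s, sd_eps, 20000, rng)
print("sample correlation   :", round(pearson(s, c), 3))
print("predicted correlation:", round(predicted_corr(beta, sd_s ** 2, sd_eps ** 2), 3))
```

With a large number of synthetic tissues the two printed values should agree closely; real data sets contain only a few dozen tissue types, so sampling noise there is much larger.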

We assume that the correlations 

$$\mathrm{Corr}[C_i^{(1)}, S_i^{(1)}] = \mathrm{Corr}[C_i^{(2)}, S_i^{(2)}] := \rho , \tag{6}$$

are the same in both groups, while the incidence of cancers is much higher in the second group. 
How is this possible? To make the example simple, we assume that the rate of divisions of the 
normal self-renewing cells maintaining the homeostasis of a given tissue i is approximately the 
same for all members of our population, and thus the same in both groups. This amounts to 
assuming 

$$S_i^{(1)} = S_i^{(2)} := S_i . \tag{7}$$

To keep our derivation simple, we assume that the logarithm in base 10 of the contribution 
to lifetime cancer risks not explained by stem cell divisions, namely $\epsilon_i^{(j)}$ ($j = 1,2$), has a mean 
value equal to zero and is solely characterised by its variance $\mathrm{Var}[\epsilon_i^{(j)}]$. Then, by definition, the 
corresponding lifetime risk of cancers is $e_i^{(j)} = 10^{\epsilon_i^{(j)}}$, $j = 1,2$. The mean value of $e_i^{(j)}$ is then 
$10^{\frac{\ln 10}{2}\mathrm{Var}[\epsilon_i^{(j)}]}$, $j = 1,2$ (taking $\epsilon_i^{(j)}$ to be Gaussian, consistent with its being characterised 
by its variance alone). This shows that the magnitude of lifetime cancer risks not explained by 
the number of stem cell divisions is controlled only by the variance $\mathrm{Var}[\epsilon_i^{(j)}]$, for $j = 1,2$. Then, 
group 2 exhibits many more cancers than group 1 ($C_i^{(2)} \gg C_i^{(1)}$) in the following cases: 

^Given the range of lifetime cancer risks from 10“^ to 0.3 and of the total numbers of divisions of stem cells 
from 10® to 10^®, for a linear correlation analysis (Pearson correlation coefficient), Tomasetti and Vogelstein [11] 
used these logarithmic variables (see their supplementary materials). The relevance of the use of log-variables is 
further suggested by their definition of the “extra risk score” [11]. 

(a) $\beta^{(2)} \gg \beta^{(1)}$ (much larger sensitivity to stem cell divisions) while $\mathrm{Var}[\epsilon_i^{(1)}]$ and $\mathrm{Var}[\epsilon_i^{(2)}]$ 
remain of the same order of magnitude; 

(b) $\mathrm{Var}[\epsilon_i^{(2)}] \gg \mathrm{Var}[\epsilon_i^{(1)}]$, while the sensitivities $\beta^{(1)}$ and $\beta^{(2)}$ to stem cell divisions remain 
similar; 

(c) $\beta^{(2)} \gg \beta^{(1)}$ and $\mathrm{Var}[\epsilon_i^{(2)}] \gg \mathrm{Var}[\epsilon_i^{(1)}]$. 
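
Cases (b) and (c) rely on the statement that the variance of the unexplained log-risk controls the mean unexplained risk. The sketch below illustrates this under one explicit assumption, namely that the unexplained log-risk is Gaussian with zero mean; the mean of 10 raised to that variable is then 10 raised to (ln 10 / 2) times the variance. The function names and the variance values are hypothetical.

```python
import math
import random

def mean_unexplained_risk(var_eps):
    """Mean of e = 10**eps for Gaussian eps with zero mean and variance var_eps."""
    return 10 ** (0.5 * math.log(10) * var_eps)

def sampled_mean(var_eps, n, rng):
    """Monte Carlo estimate of the same mean, for comparison."""
    sd = math.sqrt(var_eps)
    return sum(10 ** rng.gauss(0.0, sd) for _ in range(n)) / n

rng = random.Random(1)
for var_eps in (0.05, 0.25):                 # illustrative variances only
    exact = mean_unexplained_risk(var_eps)
    approx = sampled_mean(var_eps, 200000, rng)
    print(f"Var[eps]={var_eps}: closed form {exact:.3f}, simulated {approx:.3f}")
```

The mean grows monotonically with the variance even though the mean of the log-risk stays at zero, which is why a larger residual variance in group 2 corresponds to more unexplained cancers in that group.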

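Consider the identity linking $\mathrm{Corr}[C_i^{(j)}, S_i^{(j)}]$ and $\mathrm{Var}[\epsilon_i^{(j)}]$ versus $\beta^{(j)}$ derived from (3) and (5), 

$$\mathrm{Corr}[C_i^{(j)}, S_i^{(j)}] = \left(1 + \frac{\mathrm{Var}[\epsilon_i^{(j)}]}{[\beta^{(j)}]^2\,\mathrm{Var}[S_i^{(j)}]}\right)^{-1/2} . \tag{8}$$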
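
The three cases can be checked numerically with a short sketch. It uses the form of the identity that follows from the correlation and variance definitions for a positive slope, Corr = (1 + Var[eps] / (beta^2 Var[S]))^(-1/2), and compares a hypothetical group 2 with group 1. The numerical values and the scaling factor `k` are illustrative assumptions.

```python
import math

def corr_identity(beta, var_eps, var_s):
    """Corr[C, S] = (1 + Var[eps] / (beta^2 * Var[S]))**(-1/2), for beta > 0."""
    return 1.0 / math.sqrt(1.0 + var_eps / (beta ** 2 * var_s))

var_s = 4.0                      # Var[S_i], shared by both groups (illustrative)
beta1, var_eps1 = 0.5, 0.49      # hypothetical group 1
rho1 = corr_identity(beta1, var_eps1, var_s)

k = 3.0                          # hypothetical factor separating group 2 from group 1
cases = {
    "a": (k * beta1, var_eps1),            # larger sensitivity only
    "b": (beta1, k ** 2 * var_eps1),       # larger residual variance only
    "c": (k * beta1, k ** 2 * var_eps1),   # both, with Var[eps] scaled as beta^2
}
rho2 = {name: corr_identity(b, v, var_s) for name, (b, v) in cases.items()}
for name in cases:
    print(f"case ({name}): group 1 corr {rho1:.4f}, group 2 corr {rho2[name]:.4f}")
```

Only the combination in which the residual variance grows in proportion to the squared sensitivity leaves the within-group correlation unchanged; the other two combinations push the group 2 correlation above or below that of group 1.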


Case (a) leads to $\mathrm{Corr}[C_i^{(1)}, S_i^{(1)}] \ll \mathrm{Corr}[C_i^{(2)}, S_i^{(2)}]$, in contradiction with our assumption (6). 
Case (b) leads to $\mathrm{Corr}[C_i^{(1)}, S_i^{(1)}] \gg \mathrm{Corr}[C_i^{(2)}, S_i^{(2)}]$, again in contradiction with (6). In fact, 
expression (8) implies that $\mathrm{Corr}[C_i^{(j)}, S_i^{(j)}]$ remains unchanged when $\beta^{(j)}$ is increased arbitrarily 
while $\mathrm{Var}[\epsilon_i^{(j)}]$ is also increased proportionally to $(\beta^{(j)})^2$, since $\mathrm{Var}[S_i]$ is assumed to be the same 
in the two groups. Thus, the assumption (6) together with the identity (8) imposes case (c) as 
the only general possibility for $C_i^{(2)} \gg C_i^{(1)}$. 

The analysis of Tomasetti and Vogelstein [11] does not distinguish between groups exhibiting 
different cancer rates. This amounts to considering the total population of the two groups put 
together. Then, in our hypothetical population, Tomasetti and Vogelstein would observe 

$$C_i^{(1)} + C_i^{(2)} = [\beta^{(1)} + \beta^{(2)}]\,S_i + \epsilon_i^{(1)} + \epsilon_i^{(2)} , \tag{9}$$

using our assumption (7). In this meta-population, the correlation studied by Tomasetti and 
Vogelstein [11] is that between $C_i^{(1)} + C_i^{(2)}$ and $S_i$: 


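$$\mathrm{Corr}[C_i^{(1)} + C_i^{(2)}, S_i] = \frac{\mathrm{Cov}[C_i^{(1)}, S_i] + \mathrm{Cov}[C_i^{(2)}, S_i]}{\sqrt{\left(\mathrm{Var}[C_i^{(1)}] + \mathrm{Var}[C_i^{(2)}] + 2\beta^{(1)}\beta^{(2)}\,\mathrm{Var}[S_i]\right)\mathrm{Var}[S_i]}} . \tag{10}$$

From (3), (4), (6) and (7), we deduce 

$$\mathrm{Cov}[C_i^{(j)}, S_i] = \rho\,\sqrt{\mathrm{Var}[C_i^{(j)}]\,\mathrm{Var}[S_i]} , \tag{11}$$

which we insert in (10) to obtain 

$$\mathrm{Corr}[C_i^{(1)} + C_i^{(2)}, S_i] = \rho\,\frac{\sqrt{\mathrm{Var}[C_i^{(1)}]} + \sqrt{\mathrm{Var}[C_i^{(2)}]}}{\sqrt{\mathrm{Var}[C_i^{(1)}] + \mathrm{Var}[C_i^{(2)}] + 2\beta^{(1)}\beta^{(2)}\,\mathrm{Var}[S_i]}} . \tag{12}$$

By (5), we have 

$$\mathrm{Var}[C_i^{(j)}] \geq [\beta^{(j)}]^2\,\mathrm{Var}[S_i] , \tag{13}$$

which implies, using definition (6), 

$$\mathrm{Corr}[C_i^{(1)} + C_i^{(2)}, S_i] \geq \mathrm{Corr}[C_i^{(j)}, S_i] , \quad j = 1 \text{ or } 2 . \tag{14}$$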
The inequality (14), which recovers a standard result in statistics, constitutes our main lever 
to falsify Tomasetti and Vogelstein’s claim: the correlation between stem cell divisions and cancer 
risks at the level of the total population is in fact no lower than that found at the individual group 
level. In plain words, a strong correlation at the population level over all group types is blind to 
the existence of strong differences in group susceptibilities to cancer associated with other (i.e. 
environmental or hereditary) factors. In our hypothetical population, one group shows a much 
higher cancer rate than the other, in the presence of a strong correlation between number of stem 
cell divisions and total cancer rate, but this does not allow one to conclude that the total number 
of stem cell divisions is the dominant factor responsible for cancer in both groups (hence making 
cancer “bad luck”). On the contrary, this result is compatible with a possibly strong influence 
from other environmental and genetic factors, here embodied in the variable $\epsilon_i^{(j)}$ as well as the 
possible dependence of $\beta^{(j)}$ on the same factors. 
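
The pooled-population result can be illustrated with a short sketch for a hypothetical case (c) population. It evaluates the within-group and pooled correlations in closed form, using Var[C1 + C2] = (beta1 + beta2)^2 Var[S] + Var[eps1] + Var[eps2], and cross-checks the pooled value by simulation. The Gaussian draws, the assumption that the two residuals are independent of each other, the parameter values and the function names are all illustrative choices.

```python
import math
import random

def group_corr(beta, var_eps, var_s):
    """Within-group Corr[C, S] for C = beta * S + eps, eps independent of S."""
    return beta * var_s / math.sqrt((beta ** 2 * var_s + var_eps) * var_s)

def pooled_corr(beta1, var_eps1, beta2, var_eps2, var_s):
    """Corr[C1 + C2, S] when both groups share S and residuals are independent."""
    cov = (beta1 + beta2) * var_s
    var_sum = (beta1 + beta2) ** 2 * var_s + var_eps1 + var_eps2
    return cov / math.sqrt(var_sum * var_s)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical case (c): group 2 is three times more sensitive to stem cell
# divisions, and its residual variance is nine times larger.
var_s = 4.0
beta1, var_eps1 = 0.5, 0.49
beta2, var_eps2 = 1.5, 4.41
rho1 = group_corr(beta1, var_eps1, var_s)
rho2 = group_corr(beta2, var_eps2, var_s)
pooled = pooled_corr(beta1, var_eps1, beta2, var_eps2, var_s)

# Monte Carlo cross-check of the pooled value.
rng = random.Random(2)
s = [rng.gauss(0.0, math.sqrt(var_s)) for _ in range(20000)]
c1 = [beta1 * si + rng.gauss(0.0, math.sqrt(var_eps1)) for si in s]
c2 = [beta2 * si + rng.gauss(0.0, math.sqrt(var_eps2)) for si in s]
c_sum = [a + b for a, b in zip(c1, c2)]
sampled = pearson(s, c_sum)

print(f"group 1: {rho1:.4f}  group 2: {rho2:.4f}")
print(f"pooled (closed form): {pooled:.4f}  pooled (simulated): {sampled:.4f}")
```

In this example the two groups have identical within-group correlations although group 2 carries a much larger unexplained risk, and the pooled correlation is at least as large as either of them; the pooled number therefore carries no trace of the difference between the groups.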

We stress that our conclusion remains robust when relaxing the simple assumptions used in 
our hypothetical population. For instance, the demonstration generalizes straightforwardly to 
more than two groups and even to a continuum. The condition (6) of equal correlations within 
the two groups can be generalized to different values. And our argument and conclusion remain 
valid if the rate of divisions of the normal self-renewing stem cells varies 
between groups. 
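
One way to illustrate the extension to several groups with unequal within-group correlations is sketched below. Under the added assumptions that all groups share the same stem cell divisions, that every sensitivity is positive and that the residuals are independent across groups, the pooled correlation is a weighted sum of the within-group correlations divided by a standard deviation that cannot exceed the sum of the group standard deviations, so it cannot fall below the smallest within-group correlation. The group parameters and function names are hypothetical.

```python
import math

def group_corr(beta, var_eps, var_s):
    """Within-group Corr[C_j, S] for C_j = beta_j * S + eps_j."""
    return beta * math.sqrt(var_s) / math.sqrt(beta ** 2 * var_s + var_eps)

def pooled_corr_many(betas, var_epss, var_s):
    """Corr[sum_j C_j, S] for groups sharing S, with independent residuals."""
    b = sum(betas)
    return b * math.sqrt(var_s) / math.sqrt(b ** 2 * var_s + sum(var_epss))

var_s = 4.0
# Hypothetical (beta_j, Var[eps_j]) pairs giving unequal within-group correlations.
groups = [(0.3, 0.2), (0.8, 1.0), (1.5, 6.0), (2.5, 9.0)]
rhos = [group_corr(b, v, var_s) for b, v in groups]
pooled = pooled_corr_many([b for b, _ in groups], [v for _, v in groups], var_s)

print("within-group correlations:", [round(r, 4) for r in rhos])
print("pooled correlation       :", round(pooled, 4))
```

The check here is only that the pooled value is no lower than the minimum of the within-group values; how far it exceeds them depends on the chosen parameters.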

A part of the conclusion that Couzin-Frankel [1] and Tomasetti and Vogelstein [11] draw 
is thus unwarranted: Tomasetti and Vogelstein’s analysis does not allow one to conclude that 
the majority of cancers are due to unpreventable “bad luck.” We have just demonstrated that 
the existence of possibly strong differences in susceptibility to cancers, for instance due to 
environmental and genetic factors, has no effect on Tomasetti and Vogelstein’s result that a large 
fraction of the variation in cancer risk among tissues, that is, differences in cancer incidence 
among different organs, can be explained by the number of stem cell divisions. Tomasetti and 
Vogelstein’s findings point naturally to the prevalence of mutations during replications. This can 
explain why certain organs are more affected by cancer than others, but does not address the 
question of why certain populations or individuals are more affected by cancer than others. 

We have demonstrated that the coexistence of several populations with very different cancer 
rates, for instance due to environmental and genetic causes, is compatible with the empirical 
evidence of a strong correlation between the total number of cell divisions and cancer risks 
[11]. One may ask whether our hypothetical population made of two groups with $\beta^{(2)} \gg \beta^{(1)}$ 
and $\mathrm{Var}[\epsilon_i^{(2)}] \gg \mathrm{Var}[\epsilon_i^{(1)}]$ (case (c)) has anything to do with reality. The answer is empirical 
and requires extending Tomasetti and Vogelstein’s analysis to different cohorts under various 
environmental stressors as in the Framingham Heart Study of NIH |H], the China-Cornell-Oxford 
Project [3] and others [1, 2, 8, 9]. Case (c) corresponds to a consistently large correlation between 
number of stem cell divisions and cancer risk and provides an interesting testable hypothesis, 
namely that controllable environmental factors and/or genetic traits impact both the cancer risks 
related to stem cell divisions and those that seem unrelated to stem cell divisions. Testing it requires 
studying conditional correlations, thus extending the unconditional correlation study of Tomasetti 
and Vogelstein (since no condition on separate groups or cohorts is imposed in their study). 

Indications of strong environmental factors are actually observed in figure 1 of Ref. [11]: (i) 
lifetime lung cancer risk is multiplied by 12 by smoking; (ii) lifetime head and neck cancer risk is 
multiplied by 6 after human papillomavirus infection; (iii) hepatocellular carcinoma risk is 
multiplied by 10 after hepatitis C virus infection; (iv) colorectal cancer risk is multiplied by 
12 in the presence of familial adenomatous polyposis. A possible source of confusion is 
the existence of more than 200 different kinds of cancers according to present taxonomy, with 
many more subtypes being identified month by month. For the well-known cancer types, epidemiology 
shows a strong link with environmental and lifestyle factors. For the many other so-called 
sporadic cancers, epidemiological studies are much less advanced. We hope that the present note 
will help refocus on the importance of environmental and predisposing genetic factors |3ll^[71IT0] 
and not miss the forest for the trees. 

We acknowledge very helpful feedback from Thomas Cerny, Jean-Yves Henry, and Christine 
Sadeghi. 